XML Retrieval Using Pruned Element-Index Files

Authors

  • Ismail Sengör Altingövde
  • Duygu Atilgan
  • Özgür Ulusoy
Abstract

An element-index is a crucial mechanism for supporting content-only (CO) queries over XML collections. A full element-index, which indexes each element along with the content of its descendants, involves high redundancy and reduces query processing efficiency. A direct index, on the other hand, indexes only the content that is directly under each element and disregards the descendants. This results in a smaller index, but possibly at the cost of some reduction in system effectiveness. In this paper, we propose using static index pruning techniques to obtain more compact index files that can still achieve retrieval performance comparable to that of a full index. We also compare the retrieval performance of these pruning-based approaches to some other strategies that make use of a direct element-index. Our experiments, conducted along the lines of the INEX evaluation framework, reveal that pruned index files yield retrieval performance comparable to, or even better than, that of the full index and the direct index for several tasks in the ad hoc track.
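The difference between a full and a direct element-index can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the path-style element identifiers, the whitespace tokenizer, and in particular `prune_top_k`, which keeps the `k` highest term-frequency postings per term as a stand-in for static pruning. The abstract does not specify which pruning techniques the paper uses.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def direct_text(elem):
    """Tokens directly under elem: its own text plus the tails of its children."""
    parts = [elem.text or ""] + [child.tail or "" for child in elem]
    return " ".join(parts).lower().split()


def build_indexes(xml_string):
    """Return (full, direct) indexes mapping term -> {element path: tf}."""
    root = ET.fromstring(xml_string)
    full = defaultdict(Counter)
    direct = defaultdict(Counter)

    def visit(elem, path):
        own = Counter(direct_text(elem))   # content directly under this element
        subtree = Counter(own)             # will also absorb all descendants
        for i, child in enumerate(elem):
            # Hypothetical identifier: tag plus 1-based position among siblings.
            subtree += visit(child, f"{path}/{child.tag}[{i + 1}]")
        for term, tf in own.items():
            direct[term][path] = tf
        for term, tf in subtree.items():
            full[term][path] = tf
        return subtree

    visit(root, f"/{root.tag}")
    return full, direct


def prune_top_k(index, k):
    """Illustrative static pruning (an assumption): keep the k postings
    with the highest term frequency for each term."""
    return {term: dict(postings.most_common(k)) for term, postings in index.items()}


def posting_count(index):
    return sum(len(postings) for postings in index.values())


DOC = (
    "<article><title>xml retrieval</title>"
    "<sec><p>index pruning</p><p>xml index</p></sec></article>"
)
full, direct = build_indexes(DOC)
pruned = prune_top_k(full, 2)
print(posting_count(full), posting_count(direct), posting_count(pruned))
```

On this toy document the full index stores every term once per ancestor element, which is the redundancy the abstract refers to, while the direct index stores each term only at the element that immediately contains it. The pruned index sits between the two in size, depending on `k`.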


Similar resources

Data Extraction of XML Files using Searching and Indexing Techniques

XML files contain data in a well-formatted manner. Studying the format or semantics of the grammar helps retrieve the data quickly. There are many algorithms that describe how to search for data in XML files. There are a number of approaches that use data structures or are related to the contents of the document. In these cases user must know about the structure of...


A structure- and content-based multimedia information retrieval system for XML documents

Because the number of XML documents is dramatically increasing, we need to develop a multimedia information retrieval system which can support both the retrieval based on document structure and the retrieval based on image content. In order to support the structure-based retrieval, we design keyword, structure, element, and attribute index structures by indexing XML documents based on the basic...


A DTD-Syntax-Tree Based XML file Modularization Browsing Technique

First, by using the current mature HTML information retrieval techniques, an XML information retrieval system framework will be given in this paper. Then, a DTD-tree based XML file modularization browsing technique will be introduced to browse the retrieval result (a list of XML URLs). Compared with the current XML retrieval systems, our new system has the following advantages: 1) It can retrie...


Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...


Structure- and Content-Based Retrieval for XML Documents

Copyright © 2001, Idea Group Publishing. Abstract: As the number of XML documents is dramatically increasing, it is necessary to develop an XML document retrieval system that can support both structure-based retrieval and content-based retrieval. In order to support the structure-based retrieval, we design four efficient index structures, i.e., keyword, structure, element and attribute index, by i...



Publication date: 2010